Danny Kaplan
Danny Kaplan
Macalester College
Bridges
Roommates
K = for kitchen, LR = for living room
Not this …
| patient | accupuncture | pain | date | technician |
|---|---|---|---|---|
| A2322 | control | yes | 2014-03-15 | Audrey |
| A2397 | treatment | yes | 2014-03-17 | Audrey |
| A3213 | treatment | no | 2014-03-17 | Bill |
| B8732 | treatment | no | 2014-03-18 | Audrey |
| C6920 | control | yes | 2014-03-18 | Bill |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
NHANES %>%
gf_point(Height ~ Age | Gender, color = ~ Gender, alpha = 0.1)height(Age = 25, Gender = "female")## Gender Age model_output
## 1 female 25 164.8456
height(Age = 3:80, Gender = c("female", "male")) %>%
gf_line(model_output ~ Age | Gender)Model Training: tools for building functions that look like your data
hmod1 <- lm(Height ~ Gender * ns(Age, 5), data = NHANES) ns() == “not straight”?)wmod1 <- glm(outcome == "Dead" ~ smoker, data = Whickham, family = "binomial")
mod_effect(wmod1, ~ smoker, age = c(40, 50, 60))## change smoker to:smoker
## 1 -0.07537604 No Yes
wmod2 <- glm(outcome == "Dead" ~ smoker + age, data = Whickham, family = "binomial")
mod_effect(wmod2, ~ smoker, age = c(40, 50, 60))## change smoker to:smoker age
## 1 0.01377155 No Yes 40
## 2 0.03419996 No Yes 50
## 3 0.05105680 No Yes 60
NHANES %>% df_stats(Height ~ Gender, coverage(0.95))## Gender lower upper
## 1 female 102.43 176.1
## 2 male 100.90 190.3
NHANES %>%
gf_jitter(Height ~ Gender, alpha = 0.05, width = 0.15) %>%
gf_violin(alpha = 0.3, fill = ~ Gender, color = NA)hmod_ensemble <- mod_ensemble(hmod1, nreps = 4)
mod_effect(hmod_ensemble, ~ Age, Age = 5, step = 1,
Gender = c("male", "female")) %>%
arrange(Gender)## slope Age to:Age Gender bootstrap_rep
## 1 6.442775 5 6 female 1
## 2 6.394813 5 6 female 2
## 3 6.415638 5 6 female 3
## 4 6.396559 5 6 female 4
## 5 7.254420 5 6 male 1
## 6 7.258021 5 6 male 2
## 7 7.248615 5 6 male 3
## 8 7.233795 5 6 male 4
mod_eval(hmod1, data = NHANES) %>%
df_stats( ~ I((model_output - Height)^2), mean)## [1] 52.14731
Or make it a fundamental operation.
mod_error(hmod1, testdata = NHANES)## [1] 52.14731
Let’s try another model that’s more flexile
hmod2 <- lm(Height ~ Gender * ns(Age, 25), data = NHANES) How does it compare to the original?
mod_cv(hmod1, hmod2, ntrials = 50) %>%
df_stats(mse ~ model, coverage(0.95))## model lower upper
## 1 hmod1 52.21514 52.30473
## 2 hmod2 49.72174 49.86336
e.g. the Judea Pearl award in causality education